Input Sentence Splitting and Translating
Authors
Abstract
We propose a method that splits and translates input sentences for speech translation in order to overcome the long-sentence problem. The approach is based on three criteria for judging the quality of translation results. The criteria use only the output of an MT system and assume neither a particular language nor a particular MT approach. In an experiment with an EBMT system, for which prior methods either cannot be applied or perform poorly, the proposed split-and-translate method achieves much better translation quality.
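The split-and-translate idea can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not state the three criteria, so the `translate` callable and its goodness score are hypothetical stand-ins, as are the names `split_and_translate`, `toy_translate`, and `max_len`. The sketch only shows how a segmentation could be chosen from scores computed on MT output alone.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: an MT system that returns a translation together
# with a goodness score for it (higher is better). The paper's three criteria
# are not given in the abstract, so this score is only a placeholder.
Translator = Callable[[List[str]], Tuple[str, float]]


def split_and_translate(tokens: List[str], translate: Translator,
                        max_len: int = 10) -> Tuple[List[str], float]:
    """Choose the segmentation whose per-segment scores sum highest."""
    n = len(tokens)
    # best[i] holds (total score, translated parts) for the first i tokens.
    best: List[Tuple[float, List[str]]] = [(0.0, [])]
    for end in range(1, n + 1):
        candidates = []
        # Try every segment of at most max_len tokens that ends at `end`.
        for start in range(max(0, end - max_len), end):
            text, score = translate(tokens[start:end])
            prev_score, prev_parts = best[start]
            candidates.append((prev_score + score, prev_parts + [text]))
        best.append(max(candidates, key=lambda c: c[0]))
    total, parts = best[n]
    return parts, total


def toy_translate(segment: List[str]) -> Tuple[str, float]:
    # Toy stand-in for a real MT system: "translates" by upper-casing and
    # rewards segments of up to three tokens, scoring longer ones as zero.
    text = " ".join(segment).upper()
    score = float(len(segment) ** 2) if len(segment) <= 3 else 0.0
    return text, score
```

Calling `split_and_translate("a b c d e f".split(), toy_translate)` runs the search with the toy scorer. With a real system, `translate` would wrap the MT engine and derive its score from the translation output only, in keeping with the language-independent and approach-independent design described above.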
Similar resources
Splitting Long Input Sentences for Phrase-based Statistical Machine Translation
Translation results suffer when a standard phrase-based statistical machine translation system is used for translating long sentences. The translation output will not have the same word order as the source. When a sentence is long, it should be partitioned into several clauses, and word reordering during translation should be done within these clauses, not between them. In this paper, we pr...
Splitting Input Sentence for Machine Translation Using Language Model with Sentence Similarity
In order to boost the translation quality of corpus-based MT systems for speech translation, the technique of splitting an input sentence appears promising. In previous research, many methods used N-gram clues to split sentences. In this paper, to supplement N-gram based splitting methods, we introduce another clue using sentence similarity based on edit-distance. In our splitting method, we ge...
Description for IWSLT 2010
Our submission is a non-structural Example-Based Machine Translation system that translates text from Arabic to English, using a parallel corpus aligned at the paragraph / sentence level. Each new input sentence is fragmented into phrases and those phrases are matched to example patterns, using various levels of morphological information. Source-language synonyms were derived automatically and ...
System Description for IWSLT 2010
Our submission is a non-structural Example-Based Machine Translation system that translates text from Arabic to English, using a parallel corpus aligned at the paragraph / sentence level. Each new input sentence is fragmented into phrases and those phrases are matched to example patterns, using various levels of morphological information. Source-language synonyms were derived automatically and ...
Splitting Long or Ill-formed Input for Robust Spoken-language Translation
This paper proposes an input-splitting method for translating spoken language, which includes many long or ill-formed expressions. The proposed method splits input into well-balanced translation units based on a semantic distance calculation. The splitting is performed during left-to-right parsing, and does not degrade translation efficiency. The complete translation result is formed by concatenat...